Overview

Dataset statistics

Number of variables: 17
Number of observations: 539127
Missing cells: 124695
Missing cells (%): 1.4%
Duplicate rows: 3826
Duplicate rows (%): 0.7%
Total size in memory: 69.9 MiB
Average record size in memory: 136.0 B
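
The percentages and the average record size can be recomputed from the raw counts. The sketch below assumes that the missing-cell share is taken over all cells (observations times variables) and that 1 MiB is 1024 * 1024 bytes; the variable names are illustrative only.

```python
# Figures copied from the Dataset statistics block.
n_vars = 17
n_rows = 539127
missing_cells = 124695
duplicate_rows = 3826
total_mib = 69.9

# Share of missing cells over every cell in the table (assumed definition).
missing_pct = 100 * missing_cells / (n_vars * n_rows)
# Share of rows flagged as duplicates.
duplicate_pct = 100 * duplicate_rows / n_rows
# Average bytes per record, assuming 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_mib * 1024 * 1024 / n_rows

print(f"{missing_pct:.1f}% missing cells")
print(f"{duplicate_pct:.1f}% duplicate rows")
print(f"{avg_record_bytes:.1f} B per record")
```

Under these assumptions the three results round to the reported 1.4%, 0.7% and 136.0 B.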

Variable types

Numeric: 10
Categorical: 6
Unsupported: 1

Alerts

Dataset has 3826 (0.7%) duplicate rows [Duplicates]
State has a high cardinality: 55 distinct values [High cardinality]
County has a high cardinality: 1808 distinct values [High cardinality]
Crop has a high cardinality: 283 distinct values [High cardinality]
State_Code is highly correlated with State_County_Code [High correlation]
State_County_Code is highly correlated with State_Code [High correlation]
Planted_Acres is highly correlated with Planted_and_Failed_Acres [High correlation]
Planted_and_Failed_Acres is highly correlated with Planted_Acres [High correlation]
Intended_Use is highly correlated with Irrigation_Practice [High correlation]
Irrigation_Practice is highly correlated with Intended_Use [High correlation]
State_Code is highly correlated with State and 1 other field [High correlation]
County_Code is highly correlated with State [High correlation]
Crop_Code is highly correlated with Intended_Use and 1 other field [High correlation]
State is highly correlated with State_Code and 3 other fields [High correlation]
State_County_Code is highly correlated with State_Code and 1 other field [High correlation]
Intended_Use is highly correlated with Crop_Code and 1 other field [High correlation]
Irrigation_Practice is highly correlated with Crop_Code and 2 other fields [High correlation]
Crop_Type has 98932 (18.4%) missing values [Missing]
Intended_Use has 25340 (4.7%) missing values [Missing]
Planted_Acres is highly skewed (γ1 = 54.10435986) [Skewed]
Volunteer_Acres is highly skewed (γ1 = 210.2360612) [Skewed]
Failed_Acres is highly skewed (γ1 = 138.0075859) [Skewed]
Prevented_Acres is highly skewed (γ1 = 72.64528079) [Skewed]
Not_Planted_Acres is highly skewed (γ1 = 131.4743886) [Skewed]
Planted_and_Failed_Acres is highly skewed (γ1 = 54.01257506) [Skewed]
Crop_Type is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Planted_Acres has 17658 (3.3%) zeros [Zeros]
Volunteer_Acres has 510568 (94.7%) zeros [Zeros]
Failed_Acres has 529907 (98.3%) zeros [Zeros]
Prevented_Acres has 521451 (96.7%) zeros [Zeros]
Not_Planted_Acres has 522162 (96.9%) zeros [Zeros]
Planted_and_Failed_Acres has 17116 (3.2%) zeros [Zeros]
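
The skewness alerts report γ1, the third standardized moment. The sketch below computes the plain moment-based version, g1 = m3 / m2^1.5, on two small made-up samples; whether the profiler applies a bias correction on top of this is not stated in the report, so treat the function as an illustration of the quantity and not as the exact estimator used.

```python
def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical samples, not taken from the dataset.
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
zero_heavy = [0.0, 0.0, 0.0, 1.0, 100.0]  # mostly zeros plus one large value

print(skewness(symmetric))
print(skewness(zero_heavy))
```

A column that is mostly zeros with a few very large values, as the acreage columns are here, pushes g1 far above zero.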

Reproduction

Analysis started: 2022-05-23 15:10:30.753196
Analysis finished: 2022-05-23 15:11:55.214873
Duration: 1 minute and 24.46 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json
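
The reported duration is the difference between the two timestamps. A small check using only the standard library:

```python
from datetime import datetime

# Timestamps copied from the Reproduction block.
started = datetime.fromisoformat("2022-05-23 15:10:30.753196")
finished = datetime.fromisoformat("2022-05-23 15:11:55.214873")

elapsed = (finished - started).total_seconds()
minutes, seconds = divmod(elapsed, 60)
print(f"{int(minutes)} minute and {seconds:.2f} seconds")
```

To regenerate the report itself, pandas-profiling 3.x exposes `ProfileReport(df).to_file("report.html")`; the input file and its loading step are not part of this report and would have to be supplied.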

Variables

State_Code
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 55
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 31.05325832
Minimum: 1
Maximum: 72
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.1 MiB

Quantile statistics

Minimum: 1
5-th percentile: 8
Q1: 19
median: 30
Q3: 42
95-th percentile: 54
Maximum: 72
Range: 71
Interquartile range (IQR): 23

Descriptive statistics

Standard deviation: 14.08137574
Coefficient of variation (CV): 0.4534588799
Kurtosis: -0.7979441515
Mean: 31.05325832
Median Absolute Deviation (MAD): 11
Skewness: -0.01330404755
Sum: 16741650
Variance: 198.2851426
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value              Count     Frequency (%)
48                 35657     6.6%
27                 26281     4.9%
31                 26132     4.8%
19                 25414     4.7%
20                 22637     4.2%
55                 22074     4.1%
17                 19790     3.7%
39                 18846     3.5%
26                 18214     3.4%
29                 18123     3.4%
Other values (45)  305959    56.8%

Minimum 10 values

Value              Count     Frequency (%)
1                  8923      1.7%
2                  301       0.1%
4                  1488      0.3%
5                  6054      1.1%
6                  9513      1.8%
8                  9760      1.8%
9                  2172      0.4%
10                 1056      0.2%
12                 6537      1.2%
13                 16963     3.1%

Maximum 10 values

Value              Count     Frequency (%)
72                 1320      0.2%
69                 33        < 0.1%
60                 15        < 0.1%
56                 3397      0.6%
55                 22074     4.1%
54                 3969      0.7%
53                 7170      1.3%
52                 331       0.1%
51                 10874     2.0%
50                 1700      0.3%
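
Several of the State_Code statistics are derived from others in the same block: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the range and IQR are simple differences. A quick consistency check on the reported figures (variable names are illustrative):

```python
# Figures copied from the State_Code statistics.
mean = 31.05325832
std = 14.08137574
q1, q3 = 19, 42
minimum, maximum = 1, 72

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance from the standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(cv, variance, iqr, value_range)
```

Since State_Code looks like an identifier and not a measured quantity, these moments describe how rows are spread over the codes more than anything about the states themselves.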

County_Code
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 272
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 90.68761906
Minimum: 1
Maximum: 810
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.1 MiB

Quantile statistics

Minimum: 1
5-th percentile: 5
Q1: 33
median: 75
Q3: 125
95-th percentile: 217
Maximum: 810
Range: 809
Interquartile range (IQR): 92

Descriptive statistics

Standard deviation: 81.23338367
Coefficient of variation (CV): 0.8957494365
Kurtosis: 9.532978197
Mean: 90.68761906
Median Absolute Deviation (MAD): 46
Skewness: 2.319417732
Sum: 48892144
Variance: 6598.862622
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value               Count     Frequency (%)
5                   9962      1.8%
11                  9838      1.8%
3                   9241      1.7%
21                  9236      1.7%
1                   8843      1.6%
27                  8721      1.6%
25                  8571      1.6%
19                  8457      1.6%
13                  8166      1.5%
15                  8099      1.5%
Other values (262)  449993    83.5%

Minimum 10 values

Value               Count     Frequency (%)
1                   8843      1.6%
2                   388       0.1%
3                   9241      1.7%
4                   149       < 0.1%
5                   9962      1.8%
6                   19        < 0.1%
7                   6989      1.3%
9                   7633      1.4%
11                  9838      1.8%
12                  90        < 0.1%

Maximum 10 values

Value               Count     Frequency (%)
810                 148       < 0.1%
800                 257       < 0.1%
550                 75        < 0.1%
510                 3         < 0.1%
507                 219       < 0.1%
505                 30        < 0.1%
503                 161       < 0.1%
501                 361       0.1%
499                 37        < 0.1%
497                 140       < 0.1%

Crop_Code
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 281
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 405.2536026
Minimum: 1
Maximum: 9999
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.1 MiB

Quantile statistics

Minimum: 1
5-th percentile: 16
Q1: 67
median: 102
Q3: 158
95-th percentile: 1218
Maximum: 9999
Range: 9998
Interquartile range (IQR): 91

Descriptive statistics

Standard deviation: 1239.064331
Coefficient of variation (CV): 3.057503555
Kurtosis: 24.39880847
Mean: 405.2536026
Median Absolute Deviation (MAD): 49
Skewness: 4.964466117
Sum: 218483159
Variance: 1535280.415
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value               Count     Frequency (%)
102                 91596     17.0%
99                  62200     11.5%
296                 52333     9.7%
41                  28757     5.3%
11                  21741     4.0%
16                  14181     2.6%
53                  13462     2.5%
81                  12029     2.2%
27                  10637     2.0%
94                  10573     2.0%
Other values (271)  221618    41.1%

Minimum 10 values

Value               Count     Frequency (%)
1                   572       0.1%
2                   512       0.1%
3                   49        < 0.1%
4                   85        < 0.1%
5                   144       < 0.1%
7                   26        < 0.1%
8                   66        < 0.1%
9                   29        < 0.1%
10                  59        < 0.1%
11                  21741     4.0%

Maximum 10 values

Value               Count     Frequency (%)
9999                2         < 0.1%
9998                24        < 0.1%
9997                6         < 0.1%
9996                10        < 0.1%
9995                3         < 0.1%
9994                20        < 0.1%
9993                3         < 0.1%
9992                2         < 0.1%
9907                3         < 0.1%
9906                68        < 0.1%

State
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct: 55
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.1 MiB

Value              Count
Texas              35657
Minnesota          26281
Nebraska           26132
Iowa               25414
Kansas             22637
Other values (50)  403006

Length

Max length: 26
Median length: 14
Mean length: 8.088205562
Min length: 4

Characters and Unicode

Total characters: 4360570
Distinct characters: 47
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Alabama
2nd row: Alabama
3rd row: Alabama
4th row: Alabama
5th row: Alabama

Common Values

Value              Count     Frequency (%)
Texas              35657     6.6%
Minnesota          26281     4.9%
Nebraska           26132     4.8%
Iowa               25414     4.7%
Kansas             22637     4.2%
Wisconsin          22074     4.1%
Illinois           19790     3.7%
Ohio               18846     3.5%
Michigan           18214     3.4%
Missouri           18123     3.4%
Other values (45)  305959    56.8%

Length

Histogram of lengths of the category

Words

Value              Count     Frequency (%)
texas              35657     5.7%
dakota             31265     5.0%
north              30962     4.9%
minnesota          26281     4.2%
nebraska           26132     4.2%
iowa               25414     4.0%
south              25139     4.0%
carolina           24836     4.0%
new                24641     3.9%
kansas             22637     3.6%
Other values (54)  354890    56.5%

Most occurring characters

Value              Count     Frequency (%)
a                  604115    13.9%
i                  409135    9.4%
n                  402282    9.2%
o                  398700    9.1%
s                  345588    7.9%
e                  236518    5.4%
r                  206628    4.7%
t                  172292    4.0%
h                  134213    3.1%
l                  134211    3.1%
Other values (37)  1316888   30.2%

Most occurring categories

Value              Count     Frequency (%)
Lowercase Letter   3643658   83.6%
Uppercase Letter   627523    14.4%
Space Separator    88727     2.0%
Other Punctuation  662       < 0.1%

Most frequent character per category

Lowercase Letter
Value              Count     Frequency (%)
a                  604115    16.6%
i                  409135    11.2%
n                  402282    11.0%
o                  398700    10.9%
s                  345588    9.5%
e                  236518    6.5%
r                  206628    5.7%
t                  172292    4.7%
h                  134213    3.7%
l                  134211    3.7%
Other values (14)  599976    16.5%

Uppercase Letter
Value              Count     Frequency (%)
M                  100864    16.1%
N                  82846     13.2%
I                  74945     11.9%
C                  46281     7.4%
T                  44765     7.1%
O                  42356     6.7%
W                  36610     5.8%
K                  35711     5.7%
D                  32321     5.2%
S                  25485     4.1%
Other values (11)  105339    16.8%

Space Separator
Value              Count     Frequency (%)
(space)            88727     100.0%

Other Punctuation
Value              Count     Frequency (%)
.                  662       100.0%

Most occurring scripts

Value              Count     Frequency (%)
Latin              4271181   98.0%
Common             89389     2.0%

Most frequent character per script

Latin
Value              Count     Frequency (%)
a                  604115    14.1%
i                  409135    9.6%
n                  402282    9.4%
o                  398700    9.3%
s                  345588    8.1%
e                  236518    5.5%
r                  206628    4.8%
t                  172292    4.0%
h                  134213    3.1%
l                  134211    3.1%
Other values (35)  1227499   28.7%

Common
Value              Count     Frequency (%)
(space)            88727     99.3%
.                  662       0.7%

Most occurring blocks

Value              Count     Frequency (%)
ASCII              4360570   100.0%

Most frequent character per block

ASCII
Value              Count     Frequency (%)
a                  604115    13.9%
i                  409135    9.4%
n                  402282    9.2%
o                  398700    9.1%
s                  345588    7.9%
e                  236518    5.4%
r                  206628    4.7%
t                  172292    4.0%
h                  134213    3.1%
l                  134211    3.1%
Other values (37)  1316888   30.2%

County
Categorical

HIGH CARDINALITY

Distinct: 1808
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 4.1 MiB

Value                 Count
Washington            5964
Franklin              4630
Jackson               4389
Jefferson             4262
Lincoln               3798
Other values (1803)   516084

Length

Max length: 44
Median length: 33
Mean length: 7.091462679
Min length: 3

Characters and Unicode

Total characters: 3823199
Distinct characters: 59
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 3
Unique (%): < 0.1%

Sample

1st row: Autauga
2nd row: Autauga
3rd row: Autauga
4th row: Autauga
5th row: Autauga

Common Values

Value                 Count     Frequency (%)
Washington            5964      1.1%
Franklin              4630      0.9%
Jackson               4389      0.8%
Jefferson             4262      0.8%
Lincoln               3798      0.7%
Madison               3243      0.6%
Adams                 3057      0.6%
Marion                2909      0.5%
Monroe                2842      0.5%
Clay                  2813      0.5%
Other values (1798)   501220    93.0%

Length

Histogram of lengths of the category

Words

Value                 Count     Frequency (%)
washington            6053      1.0%
franklin              4824      0.8%
jefferson             4406      0.8%
jackson               4389      0.8%
st                    3824      0.7%
lincoln               3798      0.6%
madison               3243      0.6%
adams                 3057      0.5%
monroe                3013      0.5%
marion                2909      0.5%
Other values (1823)   545485    93.2%

Most occurring characters

Value                 Count     Frequency (%)
e                     372853    9.8%
a                     372229    9.7%
n                     317793    8.3%
o                     292720    7.7%
r                     259493    6.8%
l                     213885    5.6%
i                     197175    5.2%
t                     176418    4.6%
s                     170300    4.5%
u                     100682    2.6%
Other values (49)     1349651   35.3%

Most occurring categories

Value                 Count     Frequency (%)
Lowercase Letter      3169909   82.9%
Uppercase Letter      591341    15.5%
Space Separator       45874     1.2%
Other Punctuation     13957     0.4%
Decimal Number        2118      0.1%

Most frequent character per category

Lowercase Letter
Value                 Count     Frequency (%)
e                     372853    11.8%
a                     372229    11.7%
n                     317793    10.0%
o                     292720    9.2%
r                     259493    8.2%
l                     213885    6.7%
i                     197175    6.2%
t                     176418    5.6%
s                     170300    5.4%
u                     100682    3.2%
Other values (16)     696361    22.0%

Uppercase Letter
Value                 Count     Frequency (%)
C                     68369     11.6%
M                     53966     9.1%
S                     48147     8.1%
B                     44974     7.6%
W                     41252     7.0%
L                     38958     6.6%
P                     34313     5.8%
H                     33777     5.7%
G                     26514     4.5%
D                     26152     4.4%
Other values (15)     174919    29.6%

Other Punctuation
Value                 Count     Frequency (%)
,                     6724      48.2%
.                     4056      29.1%
&                     1059      7.6%
#                     1059      7.6%
;                     1059      7.6%

Decimal Number
Value                 Count     Frequency (%)
3                     1059      50.0%
9                     1059      50.0%

Space Separator
Value                 Count     Frequency (%)
(space)               45874     100.0%

Most occurring scripts

Value                 Count     Frequency (%)
Latin                 3761250   98.4%
Common                61949     1.6%

Most frequent character per script

Latin
Value                 Count     Frequency (%)
e                     372853    9.9%
a                     372229    9.9%
n                     317793    8.4%
o                     292720    7.8%
r                     259493    6.9%
l                     213885    5.7%
i                     197175    5.2%
t                     176418    4.7%
s                     170300    4.5%
u                     100682    2.7%
Other values (41)     1287702   34.2%

Common
Value                 Count     Frequency (%)
(space)               45874     74.1%
,                     6724      10.9%
.                     4056      6.5%
&                     1059      1.7%
#                     1059      1.7%
3                     1059      1.7%
9                     1059      1.7%
;                     1059      1.7%

Most occurring blocks

Value                 Count     Frequency (%)
ASCII                 3823199   100.0%

Most frequent character per block

ASCII
Value                 Count     Frequency (%)
e                     372853    9.8%
a                     372229    9.7%
n                     317793    8.3%
o                     292720    7.7%
r                     259493    6.8%
l                     213885    5.6%
i                     197175    5.2%
t                     176418    4.6%
s                     170300    4.5%
u                     100682    2.6%
Other values (49)     1349651   35.3%
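
In the County character tables, the characters `&`, `#`, `3`, `9` and `;` each occur exactly 1059 times. That pattern is consistent with an unescaped HTML entity `&#39;` (an apostrophe) embedded in some county names. This is an inference from the counts and is not stated by the report. A minimal cleaning sketch with the standard library's `html.unescape`; the sample names are hypothetical, since the report does not list the affected values:

```python
import html

# Hypothetical raw values; the affected county names are not shown in the report.
raw_counties = ["Prince George&#39;s", "Autauga", "St. Mary&#39;s"]

# html.unescape converts character references such as &#39; back to "'".
clean_counties = [html.unescape(name) for name in raw_counties]

print(clean_counties)
```

If the inference holds, applying this before profiling would remove the Decimal Number category from County and shrink Other Punctuation accordingly.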

State_County_Code
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 3069
Distinct (%): 0.6%
Missing: 423
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 31111.80045
Minimum: 1001
Maximum: 72141
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.1 MiB

Quantile statistics

Minimum: 1001
5-th percentile: 8007
Q1: 19181
median: 30075
Q3: 42039
95-th percentile: 54089
Maximum: 72141
Range: 71140
Interquartile range (IQR): 22858

Descriptive statistics

Standard deviation: 14046.55323
Coefficient of variation (CV): 0.4514863503
Kurtosis: -0.8269182426
Mean: 31111.80045
Median Absolute Deviation (MAD): 10968
Skewness: -0.02568760535
Sum: 1.676005135 × 10^10
Variance: 197305657.8
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                 Count     Frequency (%)
26021                 931       0.2%
27037                 854       0.2%
27145                 761       0.1%
41047                 709       0.1%
8123                  706       0.1%
36067                 701       0.1%
41059                 699       0.1%
48445                 693       0.1%
30013                 656       0.1%
55021                 649       0.1%
Other values (3059)   531345    98.6%

Minimum 10 values

Value                 Count     Frequency (%)
1001                  247       < 0.1%
1003                  274       0.1%
1005                  192       < 0.1%
1007                  61        < 0.1%
1009                  219       < 0.1%
1011                  108       < 0.1%
1013                  105       < 0.1%
1015                  121       < 0.1%
1017                  142       < 0.1%
1019                  172       < 0.1%

Maximum 10 values

Value                 Count     Frequency (%)
72141                 81        < 0.1%
72113                 100       < 0.1%
72097                 187       < 0.1%
72081                 70        < 0.1%
72047                 77        < 0.1%
72025                 74        < 0.1%
72019                 110       < 0.1%
72013                 133       < 0.1%
72001                 65        < 0.1%
69110                 32        < 0.1%
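
The range of State_County_Code (1001 to 72141) and its correlation with State_Code are consistent with a combined code of the form State_Code * 1000 + County_Code. The report does not state this layout, so the helpers below are a sketch under that assumption, and both function names are hypothetical.

```python
from typing import Tuple


def state_county_code(state_code: int, county_code: int) -> int:
    """Combine a state code and a three-digit county code (assumed layout)."""
    if not 0 < county_code < 1000:
        raise ValueError("county code must fit in three digits")
    return state_code * 1000 + county_code


def split_state_county_code(code: int) -> Tuple[int, int]:
    """Split a combined code back into (state_code, county_code)."""
    return divmod(code, 1000)


print(state_county_code(1, 1))
print(split_state_county_code(72141))
```

Under this assumption, the 423 rows with a missing State_County_Code could be rebuilt from State_Code and County_Code, which have no missing values.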

Crop
Categorical

HIGH CARDINALITY

Distinct: 283
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.1 MiB

Value                Count
GRASS                91596
CRP                  62200
MIXED FORAGE         52333
CORN                 28757
WHEAT                21741
Other values (278)   282500

Length

Max length: 34
Median length: 27
Mean length: 7.013575651
Min length: 3

Characters and Unicode

Total characters: 3781208
Distinct characters: 33
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 14
Unique (%): < 0.1%

Sample

1st row: WHEAT
2nd row: WHEAT
3rd row: OATS
4th row: OATS
5th row: OATS

Common Values

Value                Count     Frequency (%)
GRASS                91596     17.0%
CRP                  62200     11.5%
MIXED FORAGE         52333     9.7%
CORN                 28757     5.3%
WHEAT                21741     4.0%
OATS                 14181     2.6%
GRAPES               13462     2.5%
SOYBEANS             12029     2.2%
ALFALFA              10637     2.0%
RYE                  10573     2.0%
Other values (273)   221618    41.1%

Length

Histogram of lengths of the category

Words

Value                Count     Frequency (%)
grass                91596     14.0%
crp                  62200     9.5%
forage               61498     9.4%
mixed                52333     8.0%
corn                 28757     4.4%
wheat                21741     3.3%
sorghum              17751     2.7%
oats                 14181     2.2%
grapes               13462     2.1%
soybeans             12029     1.8%
Other values (315)   281028    42.8%

Most occurring characters

Value                Count     Frequency (%)
R                    434693    11.5%
E                    404263    10.7%
S                    400081    10.6%
A                    378609    10.0%
O                    257467    6.8%
G                    208296    5.5%
P                    170053    4.5%
C                    162592    4.3%
T                    138632    3.7%
L                    135522    3.6%
Other values (23)    1091000   28.9%

Most occurring categories

Value                Count     Frequency (%)
Uppercase Letter     3647301   96.5%
Space Separator      131754    3.5%
Other Punctuation    954       < 0.1%
Open Punctuation     584       < 0.1%
Close Punctuation    584       < 0.1%
Decimal Number       31        < 0.1%

Most frequent character per category

Uppercase Letter
Value                Count     Frequency (%)
R                    434693    11.9%
E                    404263    11.1%
S                    400081    11.0%
A                    378609    10.4%
O                    257467    7.1%
G                    208296    5.7%
P                    170053    4.7%
C                    162592    4.5%
T                    138632    3.8%
L                    135522    3.7%
Other values (16)    957093    26.2%

Decimal Number
Value                Count     Frequency (%)
0                    11        35.5%
3                    10        32.3%
1                    10        32.3%

Space Separator
Value                Count     Frequency (%)
(space)              131754    100.0%

Other Punctuation
Value                Count     Frequency (%)
/                    954       100.0%

Open Punctuation
Value                Count     Frequency (%)
(                    584       100.0%

Close Punctuation
Value                Count     Frequency (%)
)                    584       100.0%

Most occurring scripts

Value                Count     Frequency (%)
Latin                3647301   96.5%
Common               133907    3.5%

Most frequent character per script

Latin
Value                Count     Frequency (%)
R                    434693    11.9%
E                    404263    11.1%
S                    400081    11.0%
A                    378609    10.4%
O                    257467    7.1%
G                    208296    5.7%
P                    170053    4.7%
C                    162592    4.5%
T                    138632    3.8%
L                    135522    3.7%
Other values (16)    957093    26.2%

Common
Value                Count     Frequency (%)
(space)              131754    98.4%
/                    954       0.7%
(                    584       0.4%
)                    584       0.4%
0                    11        < 0.1%
3                    10        < 0.1%
1                    10        < 0.1%

Most occurring blocks

Value                Count     Frequency (%)
ASCII                3781208   100.0%

Most frequent character per block

ASCII
Value                Count     Frequency (%)
R                    434693    11.5%
E                    404263    10.7%
S                    400081    10.6%
A                    378609    10.0%
O                    257467    6.8%
G                    208296    5.5%
P                    170053    4.5%
C                    162592    4.3%
T                    138632    3.7%
L                    135522    3.6%
Other values (23)    1091000   28.9%

Crop_Type
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 98932
Missing (%): 18.4%
Memory size: 4.1 MiB
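
The report does not say why Crop_Type was rejected as Unsupported. One possible cause, assumed here and not confirmed by the report, is a column that mixes several Python types. A first diagnostic step is to count the types that actually occur; the helper name and the sample values below are hypothetical.

```python
from collections import Counter


def type_counts(values):
    """Count the Python type names present in a column's values."""
    return Counter(type(v).__name__ for v in values)


# Hypothetical sample; the real Crop_Type values are not shown in the report.
sample = ["SPR", "WTR", None, 12, float("nan")]

print(type_counts(sample))
```

With a pandas column the same check can be run on `df["Crop_Type"].tolist()`; a result with more than one non-null type would support the mixed-type explanation.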

Intended_Use
Categorical

HIGH CORRELATION
MISSING

Distinct: 29
Distinct (%): < 0.1%
Missing: 25340
Missing (%): 4.7%
Memory size: 4.1 MiB

Value               Count
Forage              107064
Fresh               95976
Blank               81833
Grazing             69946
Grain               61445
Other values (24)   97523

Length

Max length: 14
Median length: 13
Mean length: 6.227061019
Min length: 3

Characters and Unicode

Total characters: 3199383
Distinct characters: 38
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: Grain
2nd row: Seed
3rd row: Grazing
4th row: Grain
5th row: Grazing

Common Values

Value               Count     Frequency (%)
Forage              107064    19.9%
Fresh               95976     17.8%
Blank               81833     15.2%
Grazing             69946     13.0%
Grain               61445     11.4%
Left Standing       28253     5.2%
Processed           19405     3.6%
Seed                18155     3.4%
Cover Only          8258      1.5%
Silage              5589      1.0%
Other values (19)   17863     3.3%
(Missing)           25340     4.7%

Length

Histogram of lengths of the category

Words

Value               Count     Frequency (%)
forage              107064    19.1%
fresh               95976     17.1%
blank               81833     14.6%
grazing             69946     12.5%
grain               61445     11.0%
left                28253     5.0%
standing            28253     5.0%
processed           19405     3.5%
seed                18155     3.2%
cover               8258      1.5%
Other values (23)   41272     7.4%

Most occurring characters

Value               Count     Frequency (%)
r                   373621    11.7%
a                   362005    11.3%
e                   338022    10.6%
n                   285494    8.9%
g                   210876    6.6%
F                   203238    6.4%
i                   171920    5.4%
o                   138854    4.3%
s                   137461    4.3%
G                   135211    4.2%
Other values (28)   842681    26.3%

Most occurring categories

Value               Count     Frequency (%)
Lowercase Letter    2589690   80.9%
Uppercase Letter    562916    17.6%
Space Separator     46073     1.4%
Other Punctuation   704       < 0.1%

Most frequent character per category

Lowercase Letter
Value               Count     Frequency (%)
r                   373621    14.4%
a                   362005    14.0%
e                   338022    13.1%
n                   285494    11.0%
g                   210876    8.1%
i                   171920    6.6%
o                   138854    5.4%
s                   137461    5.3%
l                   103688    4.0%
h                   95976     3.7%
Other values (11)   371773    14.4%

Uppercase Letter
Value               Count     Frequency (%)
F                   203238    36.1%
G                   135211    24.0%
B                   83009     14.7%
S                   55268     9.8%
L                   28273     5.0%
P                   21457     3.8%
C                   9434      1.7%
O                   8665      1.5%
D                   6514      1.2%
E                   5377      1.0%
Other values (5)    6470      1.1%

Space Separator
Value               Count     Frequency (%)
(space)             46073     100.0%

Other Punctuation
Value               Count     Frequency (%)
/                   704       100.0%

Most occurring scripts

Value               Count     Frequency (%)
Latin               3152606   98.5%
Common              46777     1.5%

Most frequent character per script

Latin
Value               Count     Frequency (%)
r                   373621    11.9%
a                   362005    11.5%
e                   338022    10.7%
n                   285494    9.1%
g                   210876    6.7%
F                   203238    6.4%
i                   171920    5.5%
o                   138854    4.4%
s                   137461    4.4%
G                   135211    4.3%
Other values (26)   795904    25.2%

Common
Value               Count     Frequency (%)
(space)             46073     98.5%
/                   704       1.5%

Most occurring blocks

Value               Count     Frequency (%)
ASCII               3199383   100.0%

Most frequent character per block

ASCII
Value               Count     Frequency (%)
r                   373621    11.7%
a                   362005    11.3%
e                   338022    10.6%
n                   285494    8.9%
g                   210876    6.6%
F                   203238    6.4%
i                   171920    5.4%
o                   138854    4.3%
s                   137461    4.3%
G                   135211    4.2%
Other values (28)   842681    26.3%
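
Intended_Use contains both true missing values (25340) and a literal category named "Blank" (81833). If "Blank" means that no use was recorded, which is an assumption the report does not confirm, the effective missing share is much larger than the 4.7% shown. A sketch of the arithmetic and of a normalization helper (the helper name is hypothetical):

```python
# Figures copied from the Intended_Use section.
n_rows = 539127
missing = 25340
blank = 81833

# Treat the literal "Blank" category as missing (assumption).
effective_missing = missing + blank
effective_missing_pct = 100 * effective_missing / n_rows


def normalize_use(value):
    """Map None and the literal 'Blank' (any case) to None; keep other values."""
    if value is None or value.strip().lower() == "blank":
        return None
    return value


print(effective_missing, f"{effective_missing_pct:.1f}%")
```

Under this reading roughly one row in five has no usable Intended_Use value.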

Irrigation_Practice
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.1 MiB

Value   Count
N       406616
I       129195
O       3316

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 539127
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: N
2nd row: N
3rd row: N
4th row: N
5th row: N

Common Values

Value   Count     Frequency (%)
N       406616    75.4%
I       129195    24.0%
O       3316      0.6%

Length

Histogram of lengths of the category

Category Frequency Plot

Words

Value   Count     Frequency (%)
n       406616    75.4%
i       129195    24.0%
o       3316      0.6%

Most occurring characters

Value   Count     Frequency (%)
N       406616    75.4%
I       129195    24.0%
O       3316      0.6%

Most occurring categories

Value              Count     Frequency (%)
Uppercase Letter   539127    100.0%

Most frequent character per category

Uppercase Letter
Value   Count     Frequency (%)
N       406616    75.4%
I       129195    24.0%
O       3316      0.6%

Most occurring scripts

Value   Count     Frequency (%)
Latin   539127    100.0%

Most frequent character per script

Latin
Value   Count     Frequency (%)
N       406616    75.4%
I       129195    24.0%
O       3316      0.6%

Most occurring blocks

Value   Count     Frequency (%)
ASCII   539127    100.0%

Most frequent character per block

ASCII
Value   Count     Frequency (%)
N       406616    75.4%
I       129195    24.0%
O       3316      0.6%

Planted_Acres
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 150912
Distinct (%): 28.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3775.961195
Minimum: 0
Maximum: 6914872.27
Zeros: 17658
Zeros (%): 3.3%
Negative: 0
Negative (%): 0.0%
Memory size: 4.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0.065
Q1: 6.01
median: 51.2
Q3: 365.42
95-th percentile: 7972.457
Maximum: 6914872.27
Range: 6914872.27
Interquartile range (IQR): 359.41

Descriptive statistics

Standard deviation: 45591.58925
Coefficient of variation (CV): 12.07416785
Kurtosis: 4855.667575
Mean: 3775.961195
Median Absolute Deviation (MAD): 50.7
Skewness: 54.10435986
Sum: 2035722631
Variance: 2078593011
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                   Count     Frequency (%)
0                       17658     3.3%
1                       5398      1.0%
0.5                     3883      0.7%
2                       3873      0.7%
0.1                     2910      0.5%
3                       2432      0.5%
5                       2314      0.4%
0.25                    2091      0.4%
4                       1951      0.4%
1.5                     1931      0.4%
Other values (150902)   494686    91.8%

Minimum 10 values

Value                   Count     Frequency (%)
0                       17658     3.3%
0.0001                  48        < 0.1%
0.0002                  16        < 0.1%
0.0003                  10        < 0.1%
0.0004                  5         < 0.1%
0.0005                  7         < 0.1%
0.0006                  24        < 0.1%
0.0007                  11        < 0.1%
0.0008                  8         < 0.1%
0.0009                  9         < 0.1%

Maximum 10 values

Value                   Count     Frequency (%)
6914872.27              1         < 0.1%
6128962.12              1         < 0.1%
5578955.55              1         < 0.1%
5533689.1               1         < 0.1%
5461513.27              1         < 0.1%
5424527.57              1         < 0.1%
4754721.09              1         < 0.1%
4315211.08              1         < 0.1%
4230149.13              1         < 0.1%
4226050.6               1         < 0.1%

Volunteer_Acres
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 14413
Distinct (%): 2.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 122.2206267
Minimum: 0
Maximum: 3205029.9
Zeros: 510568
Zeros (%): 94.7%
Negative: 0
Negative (%): 0.0%
Memory size: 4.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 1.38
Maximum: 3205029.9
Range: 3205029.9
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 10725.21554
Coefficient of variation (CV): 87.75290912
Kurtosis: 52098.19125
Mean: 122.2206267
Median Absolute Deviation (MAD): 0
Skewness: 210.2360612
Sum: 65892439.79
Variance: 115030248.5
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                  Count     Frequency (%)
0                      510568    94.7%
1                      124       < 0.1%
2                      124       < 0.1%
3                      99        < 0.1%
5                      91        < 0.1%
4                      88        < 0.1%
10                     82        < 0.1%
0.5                    73        < 0.1%
1.5                    54        < 0.1%
0.6                    42        < 0.1%
Other values (14403)   27782     5.2%
ValueCountFrequency (%)
0510568
94.7%
0.0011
 
< 0.1%
0.0022
 
< 0.1%
0.011
 
< 0.1%
0.035
 
< 0.1%
0.03991
 
< 0.1%
0.046
 
< 0.1%
0.0461
 
< 0.1%
0.055
 
< 0.1%
0.0611
 
< 0.1%
Maximum 10 values

Value                  Count     Frequency (%)
3205029.9              1         < 0.1%
2943698.03             1         < 0.1%
2804325.5              1         < 0.1%
2774579.4              1         < 0.1%
2576408.947            1         < 0.1%
1661772.11             1         < 0.1%
1618524.667            1         < 0.1%
1194009.02             1         < 0.1%
1119797.44             1         < 0.1%
1114766.57             1         < 0.1%

Failed_Acres
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 6835
Distinct (%): 1.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.63542055
Minimum: 0
Maximum: 245694.68
Zeros: 529907
Zeros (%): 98.3%
Negative: 0
Negative (%): 0.0%
Memory size: 4.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 0
Maximum: 245694.68
Range: 245694.68
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 908.3418301
Coefficient of variation (CV): 58.0951326
Kurtosis: 25499.62461
Mean: 15.63542055
Median Absolute Deviation (MAD): 0
Skewness: 138.0075859
Sum: 8429477.374
Variance: 825084.8803
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                 Count     Frequency (%)
0                     529907    98.3%
10                    51        < 0.1%
2                     50        < 0.1%
5                     45        < 0.1%
1                     45        < 0.1%
15                    43        < 0.1%
30                    39        < 0.1%
4                     37        < 0.1%
3                     37        < 0.1%
20                    35        < 0.1%
Other values (6825)   8838      1.6%

Minimum 10 values

Value                 Count     Frequency (%)
0                     529907    98.3%
0.0004                1         < 0.1%
0.001                 10        < 0.1%
0.002                 2         < 0.1%
0.003                 1         < 0.1%
0.0046                1         < 0.1%
0.005                 10        < 0.1%
0.0057                1         < 0.1%
0.01                  5         < 0.1%
0.0101                1         < 0.1%

Maximum 10 values

Value                 Count     Frequency (%)
245694.68             1         < 0.1%
228193.03             1         < 0.1%
135270.325            1         < 0.1%
131140.46             1         < 0.1%
127969.01             1         < 0.1%
123209.77             1         < 0.1%
122006.88             1         < 0.1%
121225.81             1         < 0.1%
118441.24             1         < 0.1%
115055.02             1         < 0.1%

Prevented_Acres
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 14217
Distinct (%): 2.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 60.16144427
Minimum: 0
Maximum: 259649.49
Zeros: 521451
Zeros (%): 96.7%
Negative: 0
Negative (%): 0.0%
Memory size: 4.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 0
Maximum: 259649.49
Range: 259649.49
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1377.104224
Coefficient of variation (CV): 22.89014569
Kurtosis: 8367.386719
Mean: 60.16144427
Median Absolute Deviation (MAD): 0
Skewness: 72.64528079
Sum: 32434658.96
Variance: 1896416.044
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                  Count     Frequency (%)
0                      521451    96.7%
5                      86        < 0.1%
2                      73        < 0.1%
1                      69        < 0.1%
10                     66        < 0.1%
20                     60        < 0.1%
3                      59        < 0.1%
4                      58        < 0.1%
0.5                    47        < 0.1%
6                      41        < 0.1%
Other values (14207)   17117     3.2%
ValueCountFrequency (%)
0521451
96.7%
0.00111
 
< 0.1%
0.00572
 
< 0.1%
0.018
 
< 0.1%
0.022
 
< 0.1%
0.032
 
< 0.1%
0.045
 
< 0.1%
0.062
 
< 0.1%
0.0751
 
< 0.1%
0.113
 
< 0.1%
Maximum 10 values

Value                  Count     Frequency (%)
259649.49              1         < 0.1%
221519.63              1         < 0.1%
208771.005             1         < 0.1%
186101.96              1         < 0.1%
168317.73              1         < 0.1%
167046.4               1         < 0.1%
166986.51              1         < 0.1%
154883.75              1         < 0.1%
128749.43              1         < 0.1%
128401.03              1         < 0.1%

Not_Planted_Acres
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 10278
Distinct (%): 1.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 47.3765942
Minimum: 0
Maximum: 435738.97
Zeros: 522162
Zeros (%): 96.9%
Negative: 0
Negative (%): 0.0%
Memory size: 4.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 0
Maximum: 435738.97
Range: 435738.97
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 2008.62201
Coefficient of variation (CV): 42.3969271
Kurtosis: 22337.2707
Mean: 47.3765942
Median Absolute Deviation (MAD): 0
Skewness: 131.4743886
Sum: 25542001.1
Variance: 4034562.381
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                  Count     Frequency (%)
0                      522162    96.9%
2                      122       < 0.1%
1                      114       < 0.1%
4                      106       < 0.1%
3                      89        < 0.1%
6                      80        < 0.1%
5                      80        < 0.1%
10                     60        < 0.1%
8                      57        < 0.1%
7                      54        < 0.1%
Other values (10268)   16203     3.0%
ValueCountFrequency (%)
0522162
96.9%
0.00251
 
< 0.1%
0.0041
 
< 0.1%
0.013
 
< 0.1%
0.0121
 
< 0.1%
0.021
 
< 0.1%
0.033
 
< 0.1%
0.044
 
< 0.1%
0.054
 
< 0.1%
0.062
 
< 0.1%
Maximum 10 values

Value                  Count     Frequency (%)
435738.97              1         < 0.1%
435141.36              1         < 0.1%
429022.69              1         < 0.1%
386085.44              1         < 0.1%
380418.93              1         < 0.1%
359695.39              1         < 0.1%
223938.88              1         < 0.1%
222664.42              1         < 0.1%
216260.31              1         < 0.1%
211978.48              1         < 0.1%

Planted_and_Failed_Acres
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 151418
Distinct (%): 28.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3791.596615
Minimum: 0
Maximum: 6914872.27
Zeros: 17116
Zeros (%): 3.2%
Negative: 0
Negative (%): 0.0%
Memory size: 4.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0.07
Q1: 6.08
median: 51.45
Q3: 366.375
95-th percentile: 8009.723
Maximum: 6914872.27
Range: 6914872.27
Interquartile range (IQR): 360.295

Descriptive statistics

Standard deviation: 45619.42721
Coefficient of variation (CV): 12.03171957
Kurtosis: 4843.861243
Mean: 3791.596615
Median Absolute Deviation (MAD): 50.95
Skewness: 54.01257506
Sum: 2044152108
Variance: 2081132139
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                   Count     Frequency (%)
0                       17116     3.2%
1                       5405      1.0%
0.5                     3884      0.7%
2                       3869      0.7%
0.1                     2923      0.5%
3                       2437      0.5%
5                       2318      0.4%
0.25                    2091      0.4%
4                       1955      0.4%
1.5                     1933      0.4%
Other values (151408)   495196    91.9%

Minimum 10 values

Value                   Count     Frequency (%)
0                       17116     3.2%
0.0001                  48        < 0.1%
0.0002                  16        < 0.1%
0.0003                  10        < 0.1%
0.0004                  6         < 0.1%
0.0005                  7         < 0.1%
0.0006                  24        < 0.1%
0.0007                  11        < 0.1%
0.0008                  8         < 0.1%
0.0009                  9         < 0.1%

Maximum 10 values

Value                   Count     Frequency (%)
6914872.27              1         < 0.1%
6128962.12              1         < 0.1%
5578955.55              1         < 0.1%
5533689.1               1         < 0.1%
5461513.27              1         < 0.1%
5424527.57              1         < 0.1%
4754721.09              1         < 0.1%
4315211.08              1         < 0.1%
4230149.13              1         < 0.1%
4226050.6               1         < 0.1%
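
The reported sums and means are consistent with Planted_and_Failed_Acres being the row-wise sum of Planted_Acres and Failed_Acres, which would also explain the high-correlation alerts between the two planted columns. The report does not state this relationship; the check below only shows that the aggregate figures agree up to rounding.

```python
# Sums and means copied from the three variable sections.
sum_planted = 2035722631
sum_failed = 8429477.374
sum_planted_and_failed = 2044152108

mean_planted = 3775.961195
mean_failed = 15.63542055
mean_planted_and_failed = 3791.596615

# Differences that should be near zero if the column is a plain sum.
sum_gap = sum_planted + sum_failed - sum_planted_and_failed
mean_gap = mean_planted + mean_failed - mean_planted_and_failed

print(sum_gap, mean_gap)
```

Agreement of the aggregates does not prove the identity holds for every row; a row-level comparison on the source data would be needed for that.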

Crop_Year
Categorical

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size4.1 MiB
2020
185797 
2019
185028 
2018
168302 

Length

Max length4
Median length4
Mean length4
Min length4

Characters and Unicode

Total characters2156508
Distinct characters5
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique        0
Unique (%)    0.0%

Sample

1st row    2018
2nd row    2018
3rd row    2018
4th row    2018
5th row    2018

Common Values

Value    Count     Frequency (%)
2020     185797    34.5%
2019     185028    34.3%
2018     168302    31.2%

Length

Histogram of lengths of the category

Category Frequency Plot

Value    Count     Frequency (%)
2020     185797    34.5%
2019     185028    34.3%
2018     168302    31.2%

Most occurring characters

Value    Count     Frequency (%)
2        724924    33.6%
0        724924    33.6%
1        353330    16.4%
9        185028    8.6%
8        168302    7.8%
7.8%

Most occurring categories

Value             Count      Frequency (%)
Decimal Number    2156508    100.0%

Most frequent character per category

Decimal Number
Value    Count     Frequency (%)
2        724924    33.6%
0        724924    33.6%
1        353330    16.4%
9        185028    8.6%
8        168302    7.8%

Most occurring scripts

Value     Count      Frequency (%)
Common    2156508    100.0%

Most frequent character per script

Common
Value    Count     Frequency (%)
2        724924    33.6%
0        724924    33.6%
1        353330    16.4%
9        185028    8.6%
8        168302    7.8%

Most occurring blocks

Value    Count      Frequency (%)
ASCII    2156508    100.0%

Most frequent character per block

ASCII
Value    Count     Frequency (%)
2        724924    33.6%
0        724924    33.6%
1        353330    16.4%
9        185028    8.6%
8        168302    7.8%

Interactions


Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
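
The calculation described above can be sketched in plain Python: rank both variables (tied values share the mean of their positions), then divide the covariance of the ranks by the product of their standard deviations. The helper names and the toy data are illustrative and not part of the report.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def covariance(a, b):
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)

def spearman_rho(x, y):
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    # Product of standard deviations = sqrt(var(rank_x) * var(rank_y)).
    return covariance(rank_x, rank_y) / sqrt(
        covariance(rank_x, rank_x) * covariance(rank_y, rank_y)
    )

x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # y = x**3: nonlinear but strictly increasing
rho = spearman_rho(x, y)
print(rho)
```

Because only the ordering matters, the cubic relationship yields a ρ of 1 even though it is far from linear.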

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line does not affect r; only its direction determines the sign.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
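
A minimal sketch of that calculation, which also illustrates the invariance under shifts and positive rescaling. The function name and the sample values are illustrative only.

```python
from math import sqrt

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)

# Hypothetical, roughly linear data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
# Shift and rescale each variable separately (positive scale factors).
r_rescaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
# A negative scale factor flips the sign but not the magnitude.
r_flipped = pearson_r(x, [-b for b in y])
print(r, r_rescaled, r_flipped)
```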

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
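
The pair-counting definition above can be sketched directly. This is the simple form that divides by the total number of pairs (often called tau-a); statistical libraries frequently use a tie-adjusted variant instead, so results can differ when the data contain ties. The function name and the data are illustrative.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1   # both variables move the same way
        elif direction < 0:
            discordant += 1   # the variables move in opposite ways
        # direction == 0 is a tie and counts toward neither group
    return (concordant - discordant) / len(pairs)

x = [1, 2, 3, 4]
y = [1, 3, 2, 4]  # one swapped pair out of six
tau = kendall_tau_a(x, y)
print(tau)
```

Here five of the six pairs are concordant and one is discordant, so τ = (5 - 1) / 6.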

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
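
A sketch of the bias-corrected estimator as it is usually written following Bergsma (2013): the chi-squared statistic is scaled by the sample size, a correction term is subtracted, and the row and column counts are shrunk slightly before normalizing. This is an illustration under that assumption, not the report's own code; the function name and the contingency tables are hypothetical.

```python
from math import sqrt

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    n_rows, n_cols = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corrected = max(0.0, phi2 - (n_rows - 1) * (n_cols - 1) / (n - 1))
    rows_corrected = n_rows - (n_rows - 1) ** 2 / (n - 1)
    cols_corrected = n_cols - (n_cols - 1) ** 2 / (n - 1)
    return sqrt(phi2_corrected / min(rows_corrected - 1, cols_corrected - 1))

# Hypothetical 2x2 tables (e.g. an irrigation flag against a use category).
v_independent = cramers_v_corrected([[10, 10], [10, 10]])
v_perfect = cramers_v_corrected([[20, 0], [0, 20]])
print(v_independent, v_perfect)
```

The sketch assumes every row and column total is positive; an all-zero row or column would need to be dropped first.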

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available.

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
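
One way to read the nullity correlation mentioned above is as the ordinary Pearson correlation between per-column missing-value indicators (1 where a value is missing, 0 otherwise). The sketch below assumes that reading; the function name and the small columns are hypothetical and only loosely modelled on the Crop_Type and Intended_Use fields.

```python
from math import sqrt

def nullity_correlation(column_a, column_b):
    """Pearson correlation between the missingness indicators of two columns."""
    a = [1.0 if value is None else 0.0 for value in column_a]
    b = [1.0 if value is None else 0.0 for value in column_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / n
    std_a = sqrt(sum((p - mean_a) ** 2 for p in a) / n)
    std_b = sqrt(sum((q - mean_b) ** 2 for q in b) / n)
    return cov / (std_a * std_b)

# Hypothetical columns: the first two are missing on exactly the same rows.
crop_type = ["SOFT RED WINTER", None, "WINTER", None, "CLING PEACHES"]
intended_use = ["Grain", None, "Grazing", None, "Fresh"]
other = [None, "x", None, "y", None]  # missing exactly where the others are present

same_pattern = nullity_correlation(crop_type, intended_use)
opposite_pattern = nullity_correlation(crop_type, other)
print(same_pattern, opposite_pattern)
```

A value near +1 means two columns tend to be missing together, and a value near -1 means one is present where the other is missing. Columns with no missing values at all (or none present) have zero variance and must be excluded before computing the ratio.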

Sample

First rows

   State_Code  County_Code  Crop_Code  State    County   State_County_Code  Crop           Crop_Type              Intended_Use  Irrigation_Practice  Planted_Acres  Volunteer_Acres  Failed_Acres  Prevented_Acres  Not_Planted_Acres  Planted_and_Failed_Acres  Crop_Year
0  1           1            11         Alabama  Autauga  1001.0             WHEAT          SOFT RED WINTER        Grain         N                    161.00         0.0              0.0           0.0              0.0                161.00                    2018
1  1           1            11         Alabama  Autauga  1001.0             WHEAT          SOFT RED WINTER        Seed          N                    26.78          0.0              0.0           0.0              0.0                26.78                     2018
2  1           1            16         Alabama  Autauga  1001.0             OATS           HULLESS WINTER         Grazing       N                    47.25          0.0              0.0           0.0              0.0                47.25                     2018
3  1           1            16         Alabama  Autauga  1001.0             OATS           WINTER                 Grain         N                    166.19         0.0              0.0           0.0              0.0                166.19                    2018
4  1           1            16         Alabama  Autauga  1001.0             OATS           WINTER                 Grazing       N                    3.37           0.0              0.0           0.0              0.0                3.37                      2018
5  1           1            16         Alabama  Autauga  1001.0             OATS           WINTER                 Seed          N                    22.32          0.0              0.0           0.0              0.0                22.32                     2018
6  1           1            21         Alabama  Autauga  1001.0             COTTON UPLAND  NaN                    NaN           I                    794.80         0.0              0.0           0.0              0.0                794.80                    2018
7  1           1            21         Alabama  Autauga  1001.0             COTTON UPLAND  NaN                    NaN           N                    8359.27        0.0              0.0           0.0              0.0                8359.27                   2018
8  1           1            34         Alabama  Autauga  1001.0             PEACHES        CLING PEACHES          Fresh         N                    4.00           0.0              0.0           0.0              0.0                4.00                      2018
9  1           1            34         Alabama  Autauga  1001.0             PEACHES        FREESTONE LATE SEASON  Fresh         N                    5.00           0.0              0.0           0.0              0.0                5.00                      2018

Last rows

        State_Code  County_Code  Crop_Code  State        County  State_County_Code  Crop               Crop_Type         Intended_Use  Irrigation_Practice  Planted_Acres  Volunteer_Acres  Failed_Acres  Prevented_Acres  Not_Planted_Acres  Planted_and_Failed_Acres  Crop_Year
539117  72          141          1010       Puerto Rico  Utuado  72141.0            NURSERY            CONTAINER         Blank         I                    1.9800         0.0              0.0           0.0              0.00               1.9800                    2020
539118  72          141          1010       Puerto Rico  Utuado  72141.0            NURSERY            EDIBLE CONTAINER  Blank         I                    0.0850         0.0              0.0           0.0              0.00               0.0850                    2020
539119  72          141          1166       Puerto Rico  Utuado  72141.0            CAIMITO            NaN               Fresh         N                    0.9700         0.0              0.0           0.0              0.00               0.9700                    2020
539120  72          141          1167       Puerto Rico  Utuado  72141.0            GUAMABANA/SOURSOP  NaN               Fresh         N                    0.4856         0.0              0.0           0.0              0.00               0.4856                    2020
539121  72          141          1190       Puerto Rico  Utuado  72141.0            HONEY              NaN               Fresh         O                    0.0000         0.0              0.0           0.0              0.25               0.0000                    2020
539122  72          141          1290       Puerto Rico  Utuado  72141.0            BREADFRUIT         NaN               Fresh         N                    54.8800        0.0              0.0           0.0              0.00               54.8800                   2020
539123  72          141          7037       Puerto Rico  Utuado  72141.0            JACK FRUIT         NaN               Fresh         N                    1.9400         0.0              0.0           0.0              0.00               1.9400                    2020
539124  72          141          7164       Puerto Rico  Utuado  72141.0            RAMBUTAN           NaN               Fresh         N                    10.6724        0.0              0.0           0.0              0.00               10.6724                   2020
539125  72          141          7208       Puerto Rico  Utuado  72141.0            MANGOSTEEN         NaN               Fresh         N                    8.7300         0.0              0.0           0.0              0.00               8.7300                    2020
539126  72          141          8005       Puerto Rico  Utuado  72141.0            LYCHEE             NaN               Fresh         N                    0.9700         0.0              0.0           0.0              0.00               0.9700                    2020

Duplicate rows

Most frequently occurring

      State_Code  County_Code  Crop_Code  State           County   State_County_Code  Crop     Intended_Use  Irrigation_Practice  Planted_Acres  Volunteer_Acres  Failed_Acres  Prevented_Acres  Not_Planted_Acres  Planted_and_Failed_Acres  Crop_Year  # duplicates
2244  37          7            7501       North Carolina  Anson    37007.0            FLOWERS  Fresh         I                    0.0100         0.0              0.0           0.0              0.0                0.0100                    2020       40
2553  39          27           7501       Ohio            Clinton  39027.0            FLOWERS  Fresh         N                    0.0100         0.0              0.0           0.0              0.0                0.0100                    2018       17
624   18          181          5000       Indiana         White    18181.0            HERBS    Fresh         N                    0.0023         0.0              0.0           0.0              0.0                0.0023                    2020       16
2632  39          63           7501       Ohio            Hancock  39063.0            FLOWERS  Fresh         I                    0.0020         0.0              0.0           0.0              0.0                0.0020                    2018       15
2506  38          31           53         North Dakota    Foster   38031.0            GRAPES   Fresh         N                    0.0174         0.0              0.0           0.0              0.0                0.0174                    2018       13
1952  35          25           53         New Mexico      Lea      35025.0            GRAPES   Fresh         I                    0.0900         0.0              0.0           0.0              0.0                0.0900                    2018       12
1953  35          25           53         New Mexico      Lea      35025.0            GRAPES   Fresh         I                    0.0900         0.0              0.0           0.0              0.0                0.0900                    2019       12
1954  35          25           53         New Mexico      Lea      35025.0            GRAPES   Fresh         I                    0.0900         0.0              0.0           0.0              0.0                0.0900                    2020       12
2504  38          31           53         North Dakota    Foster   38031.0            GRAPES   Fresh         N                    0.0130         0.0              0.0           0.0              0.0                0.0130                    2019       12
2505  38          31           53         North Dakota    Foster   38031.0            GRAPES   Fresh         N                    0.0130         0.0              0.0           0.0              0.0                0.0130                    2020       12
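
A "# duplicates" style column can be produced by counting how often each complete row occurs and keeping the rows seen more than once. The sketch below does this with the standard library on a few hypothetical rows loosely modelled on the table above; it is an illustration and makes no claim about how the report itself counts duplicates.

```python
from collections import Counter

# Hypothetical rows: (State, County, Crop, Intended_Use, Irrigation_Practice,
# Planted_Acres, Crop_Year). Each tuple stands for one full record.
rows = [
    ("New Mexico", "Lea", "GRAPES", "Fresh", "I", 0.09, 2018),
    ("New Mexico", "Lea", "GRAPES", "Fresh", "I", 0.09, 2018),
    ("New Mexico", "Lea", "GRAPES", "Fresh", "I", 0.09, 2018),
    ("Ohio", "Clinton", "FLOWERS", "Fresh", "N", 0.01, 2018),
    ("Ohio", "Clinton", "FLOWERS", "Fresh", "N", 0.01, 2018),
    ("Alabama", "Autauga", "WHEAT", "Grain", "N", 161.0, 2018),
]

counts = Counter(rows)

# Keep only rows that occur more than once, most frequent first.
duplicated = [(row, n) for row, n in counts.most_common() if n > 1]

for row, n in duplicated:
    print(n, row)
```

Rows that differ only in Crop_Year, such as the New Mexico GRAPES entries for 2018, 2019 and 2020 in the table above, are distinct records and are counted separately.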